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Abstract 


Binaural or “virtual acoustic” representation has been proposed as a method of analyzing 
acoustic and vibro-acoustic data. Unfortunately, this binaural representation can require 
extensive computer power to apply the Head Related Transfer Functions (HRTFs) to a 
large number of sources, as with a vibrating structure. This work focuses on reducing the 
number of real-time computations required in this binaural analysis through the use of 
Singular Value Decomposition (SVD) and Equivalent Source Reduction (ESR). The 
SVD method reduces the complexity of the HRTF computations by breaking the HRTFs 
into dominant singular values (and vectors). The ESR method reduces the number of 
sources to be analyzed in real-time computation by replacing sources on the scale of a 
structural wavelength with sources on the scale of an acoustic wavelength. It is shown 
that the effectiveness of the SVD and ESR methods improves as the complexity of the 
source increases. In addition, preliminary auralization tests have shown that the results 
from both the SVD and ESR methods are indistinguishable from the results found with 
the exhaustive method. 


Introduction 

Improvements in digital acquisition and sensor technology have led to an increase in the 
ability to acquire large data sets. One potential way of analyzing structural acoustic data 
is the creation of a three-dimensional audio-visual environment. For example, an audio- 
visual virtual representation of the inside of the space station could allow designers and 
astronauts to experience and analyze the acoustic properties of the space station. 
Incorporating the visual aspect with the acoustic allows the user to sift rapidly through 
noise data to determine what parts of a vibrating structure are radiating sound to what 
areas in an acoustic enclosure. 

Unfortunately, the three-dimensional (or binaural) simulation of structural data is not 
currently feasible because of the large number of computations required to analyze a 
distributed source. The objective of this research has been to reduce the number of 
calculations required to calculate the binaural signals of an acoustically large source. 

Both pre-processing and real-time reduction methods will be discussed. 

Binamal literally means “two ears”, and is a term used to describe three-dimensional 
sound [1]. Humans detect the direction of sound using the differences in the sound that is 
heard by each ear. For example, a sound from the right will be louder in the right ear 
than in the left. This soimd level difference is termed Interaural Intensity Difference 
(IID) and is one of the most important aural cues for localizing sound. Another 
localization cue is the Interaural Time Difference, or ITD [2]. This is the difference in 
the amount of time taken for a sound to reach each ear. In the preceding example, the 
sound coming from the right of the head is heard by the right ear less than 1ms before 
being sensed by the left ear. This small ITD is sensed by the brain and used to determine 
the location of the sound source. 
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IID and ITD are mathematically described in the transfer function between a sound 
source and the left and right ears, termed the Head Related Transfer Function (HRTF). 
HRTFs describe not only IID and ITD, but also how sound is affected by its interaction 
with the head, torso, and ear. For example, as sound approaches the head, low frequency 
sound wraps around the head, reaching both ears. However, with high frequencies a 
shadowing effect occurs, causing the high frequency content to be sensed on one side of 
the head and not the other. Sound also reflects off the subject’s torso, causing time and 
intensity changes. The outer ear also affects the sound entering the ear, as its shape 
causes the sound to resonate at certain frequencies. Head Related Transfer Functions can 
be used to calculate the binaural signals at a listener’s ears, thereby simulating a three- 
dimensional sound field around a subject’s head. [1] 

Unfortunately, the application of the HRTFs is computationally expensive, and restricts 
real-time binaural analysis of a large number of sources (as with a vibrating stmcture). A 
great deal of research has been done on the modeling of HRTFs including Principal 
Components Analysis [3, 4], Karhunen-Loeve Expansion [5, 6], Balanced Model 
Truncation [7], State-Space Analysis [8, 9], Representation as Spherical Harmonics [10, 

1 1], Pole-Zero Approximation [12, 13], and Structural Modeling [14, 15]. Duda [16] 
presents a comprehensive summary of these different HRTF models, which have been 
researched for several reasons: to better understand the properties of HRTFs, to reduce 
the number of measurements required in order to create a full set of individualized 
HRTFs, and (as with this research) to reduce the number and complexity of real-time 
calculations. This research uses Singular Value Decomposition (SVD) to model the 
HRTFs, which is based on series expansion, as are Principal Components Analysis and 
Karhunen-Loeve Expansion. 

This research also uses the Equivalent Source Reduction (ESR) [17] in order to simplify 
the binaural simulation of a distributed source. This research combines the ESR and 
SVD methods and applies them to the binaural simulation of structural acoustic data, 
which is a new research area in binaural acoustics. 

The theory behind the binaural simulation of acoustic data is discussed in the first section 
of this report. From there, the current technology for calculating the binaural signals is 
examined in the “Exhaustive Method” portion of the report. Also included in the 
Exhaustive Method section is a description of how each step of the exhaustive method 
was verified by comparing the results with actual test data. The next section, termed 
“Reduction Methods”, details the investigation of the proposed methods for reducing the 
computation time of the exhaustive method. Included in the Reduction Methods section 
are the discussions of SVD, ESR, and the estimated reduction in computation time 
resulting from the use of SVD and ESR. Lastly, conclusions are drawn regarding the 
effectiveness and potential use of the reduction methods in real-time structural acoustic 
analysis. 
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Theory 


In order to discuss the exhaustive and reduced methods of simulating the binaural signals 
associated with structural acoustics, there are several topics that must be understood and 
discussed. The first is a discussion on how sound radiates from a monopole source. 
Following will be descriptions involving HRTF’s use in binaural simulation, including: 
the binaural simulation of sound radiating from a monopole; how the implementation of 
HRTFs changes with sampled signals rather than analog ones; the binaural simulation of 
multiple sources; and the binaural simulation of a vibrating plate. 

Radiation from a Monopole 

The simplest acoustic source is that of a pulsating sphere, also known as a monopole 
(figure 1). According to Fahy’s, Engineering Acoustics, the sound pressure at a radial 
distance, r, caused by the radiation from a monopole source is defined according to 
equation 1 [18]; 


p{r,t) = 


>0 ^ 

47ir J dt 


Qt- 


( 1 ) 


where po is the density of air and Q is the volume velocity. Q is defined according to 
equation 2 [19]: 


Q = Su (2) 

where S is the surface area of the sphere (dtiro^) and u is the velocity of the surface of the 
sphere. 

In the case of a vibrating hemisphere rather than a vibrating sphere, the pressure at some 
distance, r, will be 


p{r,t) 


^ 2m' y dt 

f f 

a 

, 27tr y I 


QV 



( 3 ) 

( 4 ) 


where a is the acceleration of the surface of the pulsating sphere (figure 1). This means 
that the sound at point R (which is a distance, r, from the pulsating sphere) will have a 
magnitude that varies inversely with the distance, r, and a time delay equal to the time it 
takes for the sound to reach point R, which will be r/c. 
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Figure 1: Pulsating Sphere 


Head Related Transfer Functions (HRTFs) 

As mentioned in the introduction, FlRTFs are a set of transfer functions that describe how 
sound is reflected off the human torso, wraps around the head, and resonates in the inner 
and outer ear. Because the human body is not axisymmetric, the HRTFs vary depending 
upon the elevation angle, (f), and the circumferential angle, 6, of the sound source. Since 
every human torso, head, and ears are different, each person has their own individual 
HRTFs. A generalized set of HRTFs can be obtained using the Knowles Electronic 
Manikin for Acoustic Research (KEMAR). The KEMAR manikin (figure 2) has a head, 
torso, and ears that are specifically shaped in order to represent the average human form. 



Figure 2: Knowles Electronic Manikin for Acoustic Research 




In the time domain, the HRTF represents the impulse response between a source of sound 
and a human’s ears. The convolution integral used to find the sound pressure at the left 
ear, pL,e,(ih due to a monopole source is shown below in equation 5 (figure 3b). The 
pressure, ^ resulting Ifom a monopole at angles 0, ^with respect to the head, 
represents the pressure at the center of the head if the head were not present (figure 3a). 

Pl,64 (0 = ^hrtfi^^^ (T)p,^ (t - T)d T ( 5 ) 

where 0 is the circumferential angle between the center of the head and the source, 

(f) is the elevation angle between the center of the head and the source, 
hrtfi is the head related impulse response for the left ear, 
and Tis a temporal variable. 




Note that because the pressure pe, <i>, is used, the HRTF itself does not contain the 1/r 
attenuation or the r/c delay. A similar equation can be written to describe the response at 
the right ear. The angles, 6 and ^ are defined according to figure 4. 




Figure 4; ^and ^z>are measured from the center of the head 
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Using Sampled Signals 

Equation 5 describes the process for finding the sound at the average human’s left ear for 
a continuous signal from a monopole source. Since the binaural signals are calculated 
using a computer, the convolution is actually made up of discrete signals rather than 
continuous ones. This results in the convolution shown in equation 6. 


M-l 

PlM W = Z ^^fL,04 

j=0 

Here M is the number of samples used in the HRTF and i is the discrete time step. 

In addition, the HRTFs are defined and cataloged only at discrete angles. In this analysis 
the authors used generalized HRTFs that were recorded at the ears of a KEMAR dummy 
head (Pinnae type DB-065) by researchers at MIT. These HRTFs can be found at 
http://web.media.mit.edu/~kdm/hrtf.html [20]. The HRTFs in the MIT set are defined 
from -40° to 90° with an elevation step of 10°. Since the HRTFs are defined spherically, 
the degree step in the circumferential direction is ditferent for each elevation. Figure 5 
shows the different circumferential angle steps for each elevation. (Note: the 
circumferential angle steps are in purple and the elevations are in red.) Further note that 
the MIT HRTFs are not a minimum phase representation, that is, the delay between the 
right and left ears are contained in the HRTF itself, rather than in a delay look-up table. 
This has relevance in the preprocessing required in the SVD reduction method. 



Figure 5: Angles for which MIT’s HRTFs are defined and cataloged 
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Multiple Sources 

Having determined how to find the binaural signals resulting from a single source, it is 
important to find the binaural signals resulting from a group of sources. Equation 6 is 
linear; therefore, linear superposition can be implemented. Equation 7 shows the total 
pressure at the left ear resulting from N number of sources: 

N N M-l 

Pl \i] = Z [/■] = Z Z 

n=l n=l ;=0 

An example of this is shown below in figure 6. The pressure at the left ear due to each of 
three monopole sources is summed to equal pi. This linear superposition process can be 
applied to any number of sources. 



Figure 6: Three monopoles radiating sound to the left ear 

Modeling a Vibrating Plate 

The goal of this research is find efficient ways to analyze stmctural acoustic data through 
binaural simulation. To develop these efficient methods, the authors have chosen to 
focus on the analysis of a vibrating plate. This plate is set in an infinite baffle and 
radiates sound into the free field. This structure and configuration was chosen for two 
reasons: 1) the plate is a relatively simple structure to analyze, allowing the research to 
focus on the binaural processes rather than structural effects, and 2) a plate in this 
configuration is essentially the same as a plate secured in a baffle, radiating sound into an 
anechoic chamber, which can be arranged in a typical acoustic laboratory. 
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It is a common practice among acousticians to replace a vibrating structure with an array 
of vibrating monopoles [18,19], The spacing of the monopoles must be less than half the 
structural and acoustic wavelengths. Figure 7 shows the sound vibrating from an array of 
monopole sources, which are used in place of the vibrating plate. This approximation can 
also be extended to more complex structures should they be considered in future work. 



Figure 7 : The sound radiated to the left ear by a vibrating plate 


Exhaustive Method 

The computer process used to calculate the binaural sound radiated from a vibrating 
structure has been termed the “exhaustive” method because it is based on the current 
technology and requires a large number of computations. A more detailed description 


and a schematic of the exhaustive method will be shown in the first sub-section. 
Following that will be two sub-sections on verification of two parts of the exhaustive 
method: the radiation model and the HRTF application. Following will be a discussion 
of the consistency of the tests that were used to compare with the results of the exhaustive 
method and a wavenumber decomposition to determine the frequency range for which 
that data is acciuate. 

Schematic of the Exhaustive Method 

The following is a basic schematic of the exhaustive method (figure 8). The vibrating 
plate mentioned in the Theory section is modeled as N number of monopole sources and 
the acceleration for each soiuce is defined. The radiation model (equation 4) applies a 
time delay and gain to the acceleration at each node in order to define the pressure in 
space caused by each of the vibrating monopoles. Head Related Transfer Functions 
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(HRTFs) for the left and right ears are then convolved with these pressures (equation 6) 
to get the pressure at each ear due to each monopole. These pressures are then summed 
to obtain the same binaural signals as that of someone hearing the sound of a vibrating 
plate in a free field. 



Time Delay N Filters 

And Gain 


Figure 8: Schematic of the exhaustive method 

Verifying Radiation Model 

It is important to note that the acceleration of each monopole source can be measured in a 
laboratory by placing an accelerometer in the same position as the replacement monopole 
source. This and other data was collected at NASA Langley by Grosveld [2 1 ] from a 
1.415m X 1.415m x 4.9mm aluminum panel (subjected to mechanical excitation) that was 
mounted in a transmission-loss facility. Also included in the data set are the transfer 
functions: from the input force to a microphone in the an echoic chamber, and from the 
input force to two microphones in the left and right ears of NASA’s KEMAR dummy 
head. The measured plate accelerations were used as input into the exhaustive code 
(figure 8), the radiated pressure was compared to the microphone transfer function, and 
the output was compared with the binaural recordings. In this way, each step of the 
exhaustive method was verified. 

The velocity of the plate at each of the 529 nodes was the first set of measurements to be 
taken. This data was collected using a laser scanning vibrometer (figure 9) and was 
saved as a transfer function between the input force and the velocity of each node on the 
plate. This data was collected for two different force positions on the plate. 



Figure 9: Laser scanning vibrometer 
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The following plot (figure 1 0) is the sum of the squares of the plate velocities, assuming a 
white noise force input. Notice that the frequency response has very sharp resonance 
peaks. This is because the test plate was very lightly damped. In fact, this plot actually 
underestimates the damping because exponential windowing was added to the vibrometer 
measurements [21]. 


Squared Plate Velocities 



Figure 10: Sum of the squares of the plate velocities 


The low damping level of the test plate is also seen in the transfer fimction (figure 1 lb) 
from the input force to a microphone at the position shown in figure 1 1 a. 


Measured Pressure at Test Point Resulting from Vibration of Plate 




Figure 11a; Microphone 
configuration in relation to plate 
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Figure 1 lb: Measured transfer 
function at test point 





This allows a direct comparison (figure 1 2) between the calculated and measured transfer 
functions from the input force to a microphone (as in figure 1 la). The calculated 
pressure was found through the exhaustive method — with measured plate velocities used 
as the input. This calculation was performed in the frequency domain, but is equivalent 
to the equations presented in the Theory section. 


Pressure at Test Point Resulting from Vibration of Plate 



Figure 12: Comparison of measured and calculated radiated pressures 

The measured and calculated transfer functions in figure 12 are satisfactorily similar. 
Notice, however, that there are two basic types of inconsistencies seen in this 
comparison; there are differences in the heights of the peaks, and there are discrepancies 
at low pressure levels. The differences in peak heights signify a difference in the amount 
of damping measured by the scanning vibrometer and the microphone. As mentioned 
before, this is due to the exponential windowing that was applied only to the vibrometer 
measurements and to the extremely low damping on the plate, which is difficult to 
measure precisely. There are also some discrepancies at low dB, but these are natural 
occurrences due to low system input level. As mentioned in the discussion of the 
exhaustive method, the radiation model simply adds a time delay and gain to the signal. 

It is important to notice from this comparison that the magnitude of the measured and 
calculated frequency responses are the same. Looking at a comparison of the measured 
and calculated pressures in the time domain will allow a check of the time delay. 

The following (figure 13) is a comparison of the calculated and measured impulse 
responses from the input to a microphone about Ini from the plate (as in the frequency 
domain comparison above). While this view of the impulse response is too broad to see 
the initial time delay, it allows examination of the end of the impulse response. Notice 
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that the measured response rings longer than the pressure calculated using the exhaustive 
method. This is expected because of the damping level difference between the two 
transfer function measurements. 


Filtered IR from the Input Force to a Microphone 



Time (sec) 

Figure 13: Extended time history of the impulse response 

A closer view of the initial part of this impulse response (figure 14) shows good 
correlation between the time delay and magnitude of the impulse responses. This is 
significant because if there were any errors in the radiation model, there would be 
differences between the time delay and gain of the calculated and measured responses. 
The closely matching delay and magnitude verify that the radiation model part of the 
exhaustive method is correct. 
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Verifying HRTF Application 

In order to check the HRTF application part of the exhaustive method, the authors 
compared measured and calculated transfer functions between the input force and the 
binaural signals at the ears of a dummy head. The binaural signals were calculated with 
the plate velocities that were measured with the laser vibrometer as input to the 
exhaustive code (as with the radiation model check). The following discussion details 
how these calculated binaural signals compare to signals that were actually measured at 
the ears of a KEMAR dummy head. 


Figure 1 5b shows the measured signal at the left ear of the KEMAR head compared with 
the left binaural signal calculated in the exhaustive method. As with the verification of 
the radiation model, there is good correlation between the time delay and magnitude of 
the measured and calculated impulse responses between the input force and the left ear 
(as shown in figure 15a). 



Filtered IRfrom the Input Force to the Left Ear 



Figure 1 5a: Head orientation Figure 1 5b: Impulse response ft om input to left ear 

Measurement Consistency 

All of the above comparisons and validations are dependent upon the reliability of the 
collected data. The following plot (figure 16b) shows the transfer function from the input 
force to a reference microphone (positioned according to the configuration in figure 1 6a) 
during two separate tests. It is important to note that the position of the microphone was 
not changed at all between tests. Notice that the primary difference between the two 
measurements was simply the estimate of the damping. This is as expected because of 
the lightly damped nature of the plate and has been accounted for in the process of 
verifying the exhaustive method. As in previous comparisons, there are also some 
discrepancies at low dB, and this is also expected because there is a low signal to noise 
ratio at these frequencies. 
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Transfer Function from the Input Force to the Reference Microphone 


panel 


F 4 


baffle 


? 


Figure 16a; Reference 
microphone configuration 



Figure 16a: Reference 
microphone transfer function 


Wavenumber Decomposition 

After verifying the consistency of measurements, the authors used wavenumber 
decomposition [18, 19] to find the frequency range for which the data is accurate. The 
plate was measured every 0.06m (figure 17a), which corresponds to a sample 
wavenumber of fc = 27t/A, =102 rad/m. Spatial aliasing will occur if the wavenumber on 
the plate is greater than half the sampling wavenumber. Figure 1 7b shows the 
wavenumber decomposition of the plate velocities at 550 Hz. At this frequency, the 
waves on the plate have wavenumbers mainly around k ~ 22 rad/m, where k is defined 
according to equation 8: 


k = 



co^m 


-|l/4 


D 


( 8 ) 


Here D is defined as: 


D = 


Eh^ 

12(1 -v^) 


( 9 ) 


where E is the modulus of elasticity of the plate, h is the plate thickness, and v is the 
poisson’s ratio of the plate. 

Notice that the acoustic wavenumber, ka = fo/c (which defines the wavenumber 
components that actually radiate sound from the plate) is only 10 rad/m, which is smaller 
than the stmctural wavenumber. This will become important in the discussion of 
wavenumber filtering later in the text. Figures 17c and 17d show the wavenumber 
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decomposition of the plate at 2000 and 4000 Hz respectively. In figure 17d aliasing is 
very visible, but at 2000 Hz (figure 17c) the bending wavenumber is equal to half the 
structural wavenumber, proving that the data is free of aliasing at frequencies up to 2000 


Hz. 



Figure 17a: Plate configuration 


Frequency = 550Hz 



decomposition at 500 Hz 


Frequency =2000Hz 




Figure 17c: Wavenumber 
decomposition at 2000 Hz 


Figure 17d: Wavenumber 
decomposition at 4000 Hz 
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Reduction Methods 


Having completed and verified the exhaustive method of calculating the binaural signals 
resulting from a structural acoustic source, the next step was to examine various methods 
for reducing the computing time and power required to make these calculations. Several 
reduction methods were examined, including: Wavenumber filtering. Equivalent Source 
Reduction, and Singular Value Decomposition. Of the three, wavenumber filtering was 
not used in this research. However, both Equivalent Source Reduction and Singular 
Value Decomposition were quite useful in reducing the amount of computations required. 
All three of these methods will be discussed, and a comparison will be made between the 
computations required in the SVD, ESR, and exhaustive methods. 

Wavenumber Filtering 

Recall the discussion concerning wavenumber decomposition in the exhaustive method 
section. In figure 18, the radiating components are defined as corresponding to the 
wavenumbers that are less than the acoustic wavenumber (in both the x and y direction). 
The theory behind wavenumber filtering is that since only some of the components 
radiate sound, for acoustic purposes it is only necessary to perform computations using 
these components [18, 19]. 


Frequency = 550Hz 



-50 0 50 

Kx (rad/m) 

Figure 18: Wavenumber Decomposition of plate at 550 Hz 


Wavenumber filtering (figure 1 9) is accomplished by first transforming velocity 
information from the spatial domain into the wavenumber domain via the Fourier 
transform. This signal is then low-pass filtered (with the cutoff at the acoustic 
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wavenumber) to isolate the radiating components. Next, the inverse Fourier transform is 
performed in order to transform the signal into the spatial domain. Finally, the signal is 
re-sampled at a lower “sample rate” (in quotations because sample rate typically refers to 
the time domain, whereas this analysis is done in the wavenumber domain, as previously 
mentioned). 



FFT 




Figure 19: Wavenumber filtering 



There are two main problems with this approach. First, the resampling process causes 
apparent sources to appear outside the actual edges of the plate. This is visible in the 
bottom right plot in figure 19, in which sources of velocity are seen as far as 2m from the 
center of the plate, which is outside the edges of the 1.4m by 1.4m plate. Second, 
wavenumber filtering can only be utilized for relatively simple geometries, such as a flat 
plate. For preliminary analysis, that is not an issue, but for more complex vibrating 
structures, such as the walls of the space station, wavemunber filtering will not be 
applicable. Since the future applications of this research include more complicated 
structmes, wavenumber filtering will not be used. Flowever, this is essentially what the 
equivalent source reduction method (next section) will achieve. 

Equivalent Source Reduction 

Another reduction method that was investigated is Equivalent Source Reduction. ESR 
effectively performs wavenumber filtering, but it can be used for more complex sources 
than wavenumber filtering and it does not create virtual sources outside the edge of the 
panel. Equivalent Source Reduction reduces the number of sources that are input into the 
exhaustive method (that is, N will be smaller in the exhaustive method schematic, figure 
8 ). 


19 



Similar to wavenumber filtering. Equivalent Source Reduction reduces the number of 
computations by replacing the measurement points on the panel with a fewer number of 
equivalent sources. The equivalent sources are driven such that the radiated sound 
closely matches the original field. This is accomplished by creating an evaluation surface 
that is evenly covered by an array of Q evaluation points. For the case of a baffled plate 
radiating into a free field the evaluation surface is simply the far-field pressure (figure 
20). At a single frequency, the far-field pressure can be described by a g length vector, 
p, at the evaluation positions. 


p = Ta 


( 10 ) 
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Figure 20: Evaluation surface of a baffled plate radiating sound into a closed room 

The pressure is due to the acceleration of the N sources on the surface of the plate 
described by the vector a. The Qhy N matrix of transfer functions, T, can be calculated 
from the frequency domain version of equation 4 [18,19]. The objective is to drive the Ne 
equivalent sources such that they accurately re-create the pressure at the evaluation 
surface. 


The re-created pressure can also be calculated in a matrix form from the accelerations of 
the equivalent sources, as. The Q by Ne transfer matrix, Te, is also calculated using the 
frequency domain version of equation 4. In order to minimize the differences between p 
and pe a cost fimction J is developed. 

•^ = (P-Pe/(P-Pe) (12) 
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This turns out to be a quadratic matrix problem with a unique minimum when [17, 22], 

aE=[T"Tj'T“Ta = Wa ( 13 ) 

The superscript H denotes the Hermitian or conjugate transpose. The term [t“Tj. \ 't« is 
often called the pseudo-inverse of Te. This is performed at all of the frequencies of 
interest to create a matrix of filters, W, (in the frequency domain) that transform the 
accelerations, a, into the reduced set of accelerations, as. By transforming the matrix of 
filters, W, into the time domain, the set of N time domain accelerations can be filtered 
into the reduced set of Ne equivalent accelerations (or sources). This can be done in a 
pre-processing stage before any real-time simulation is carried out. 

For this technique to work well there must be (i) a sufficient number of evaluation 
positions (2>N) and (ii) there must be at least two equivalent sources per acoustic 
wavelength whereas the measurements need to be every half of a structural wavelength 
(figure 21). Therefore, this technique will only be useful when the waves on the structure 
are predominantly subsonic (i.e. shorter than the acoustic wavelengths) as with the test 
plate below 2000Hz. It should be noted that the recreated acoustic field does not match 
the near-field of the original source accurately and will not produce good results close to 
the panel. However, virtual acoustic applications, in general, do not take into account the 
refractive effect of the head close to a source of a sound or the effect of near sources on 
the HRTFs. Therefore, it is thought that some inaccuracy can be tolerated in these 
circumstances. 


Acoustic 

wavelength Structural 




Figure 21 : The equivalent source spacing depends on the acoustic wavelength 
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More generally, ESR can be applied to a more complex geometry by considering the 
velocity at the evaluation surface. Consider a closed surface S inside of which is an 
evaluation surface Se (see figure 22). The particle velocity normal to the surface, Se, due 
to the original set of sources of acceleration, a, is given by the Q length vector, v. If a set 
of equivalent sources, of acceleration ue, is driven such that it creates a particle velocity, 
ve, normal to the surface, Se, that is the same as the velocity v, then the acoustic field 
inside the surface, Se, will be the same as the original field. This can be deduced from 
the Kirchoff-Helmholtz equation [18, 19, 23]. 


v = Da 

(14) 


(15) 

•7 = (v-Vj,)"(v-Ve) 

(16) 



Figure 22: Evaluation surface enclosed within a closed radiating surface 

In this example the velocity at the evaluation surface now becomes a function of the 
dynamic (or modal) behavior of the space and this is taken into account in the calculation 
of the transfer matrices D and De. The optimal set of equivalent source strengths can be 
calculated using the pseudo-inverse as, 

aE=[DEDEf'DEDa (17) 

This method will be effective as long as the listener in the simulation is inside of the 
surface, Se (this is similar to the near-field problem discussed before). In this example 
there is no restriction on where the equivalent sources are placed as long as they he 
outside of the surface Se. The calculation of the matrices D and De can be undertaken 
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using boundary element methods or other equivalent source modeling techniques. The 
ESR method can also be applied to surfaces that make up only a part of the total surface, 
as shown in figure 23. Under these circumstances the evaluation surface only encloses 
the section of the surface that is vibrating. 


S 

\ 



Figure 23: The evaluation surface encloses only the vibrating section of the surface 

Having determined that ESR was a promising reduction method, the authors implemented 
the exhaustive method using the reduced number of equivalent sources. Similar to the 
checks on the exhaustive method, two comparisons were made. The pressure at a 
microphone in space was calculated through the exhaustive method and the equivalent 
source method, and these two pressures were compared. (Note: the equivalent source 
schematic is the same as the exhaustive method, except that the number of input sources, 
N, is reduced). These two methods were also used to find and compare the binaural 
signal at the left ear of a KEMAR head. 

Figure 24 shows a far-field comparison of the pressure spectra at a microphone in space. 
Notice that the 8 by 8 grid of replacement sources matches the original 23 by 23 grid of 
sources at frequencies up to 980 Hz. This is expected because the source spacing 
(0.176m) is equivalent to half an acoustic wavelength (tuc/to) at 980 Hz. 
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Figure 24: Pressure at observer calculated through 
exhaustive and ESR methods 


Notice that in the near-field (figure 25) the equivalent source method is not as accurate. 
The two pressure signals will diverge even further after HRTFs are applied in order to 
calculate the binaural signals. 



Figure 25: Pressure at close observer calculated through 
exhaustive and ESR methods 
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When only 16 equivalent sources (figure 26) are used, this method is accurate up to 490 
Hz. Again, this is because the source spacing is equal to half the acoustic wavelength at 
490 Hz. 


z Position of observer [0.6, 0.6, 0.4] 1m from center 
ro 80| i| r 

D. i 



H 0 500 1000 1500 
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Figure 26: Pressure at observer calculated by 
exhaustive and 16 source ESR methods 

The following figure (27b) shows the frequency domain comparison of the pressure at the 
left ear of a listener (configuration in figure 27a). Notice that with 64 sources, the ESR 
method is very close to the exhaustive method up to about 1 000 Hz. 



Figure 27a: Head orientation 
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Figure 28 depicts the time domain version of figure 27b (low-pass filtered at 800 Hz). 
Cross-correlation of these two signals yields a correlation of 98.4%. 
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Figure 28: Calculated pressure at left ear 

When only 16 sources are used, the frequency domain version of the left binaural signal 
(configuration in figure 27a) that was calculated using the ESR method is accurate up to 
490 Hz (figures 29 and 30). 



0 50 100 150 200 250 300 350 400 450 500 

Frequency (Hz) 

Figiue 29: Left Binaural Signal in Frequency Domain 


26 





In the time domain (with a low-pass filter at 400 Hz), the correlation is found to be 98%. 



Figure 30: Left Binaural Signal in Time Domain 


Singular Value Decomposition 

A third reduction method investigated was singular value decomposition [23]. By 
breaking down the matrix of HRTFs into three separate matrices, the number of 
convolutions can be greatly reduced. Since convolution is computationally expensive, a 
great deal of processing time can be avoided by a small amount of pre-processing. This 
discussion will describe how singular value decomposition works, how it applies to the 
exhaustive method, and the comparisons that were made to verify that the SVD method is 
accurate. 


Singular value decomposition is simply separating a matrix (in this case a matrix of 
HRTFs) into three separate matrices: U, Z, and V (equation 18). 

HRTFs = U*'L*V'^ (18) 


U is a matrix of left singular vectors, Z is a diagonal matrix of singular values, and Visa 
matrix containing right singular vectors. The relative sizes of these matrices are visible 
in figure 31. It is important to note that singular value decomposition of the HRTF 
matrix can be done in either the time or frequency domain, but for the following analysis, 
the SVD was performed on a matrix of HRTFs in the time domain. 
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Figure 3 1 : Singular Value Decomposition of the HRTF matrix 

The original matrix of HRTFs for the left ear at 0° elevation is shown below in figure 32. 
Notice that when the sound approaches the left ear directly (angle between directly ahead 
and the approach of the sound is 90°) the sound is loud and contains high frequency 
content. When the sound comes from the other side of the head (-90°), the sound level is 
lower (due to the IID), there is a time delay (ITD), and the signal contains mostly low 
frequencies because higher frequencies are shadowed by the head. 



Figure 32: Time Domain HRTFs for 0° elevation 


The first step taken in the SVD method was to create a function to describe the time delay 
of the HRTFs for each angle and remove that time delay from each impulse response in 
the matrix of HRTFs (figure 33). This time delay removal causes the matrix of HRTFs to 
be a smoother angular function so that the singular value decomposition can be 
performed with greater accuracy. Note that this operation is not necessary for HRTFs 
having a minimum phase representation. The time delay removed in this operation is 
added to the retarded time delay (r/c) in the radiation model. 
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Figure 33: Time domain HRTFs with ITD removed 

The singular value decomposition of the FIRTF matrix (without time delays) is shown 
below in figure 34. Notice that the left singular vector matrix contains angular 
information for each singular value a,. The matrix of singular values is simply a diagonal 
matrix of singular values, which diminish in value as / becomes larger. Time information 
per singular value is contained in the right singular vector matrix. When these three 
matrices are multiplied together, the matrix of HRTFs (without time delays) will be the 
result. The advantage of SVD is that an approximate HRTF matrix can be obtained using 
only a few singular values. This will become important as the SVD process is applied to 
the exhaustive method. 



t 


Figure 34: SVD of time domain HRTFs 

Recall the schematic of the exhaustive code in figure 8. Shown below is a similar 
schematic of the SVD reduction method (figure 35). Note that instead of simply 
convolving the HRTFs with the radiated pressure for each of the N sources, the SVD 
method expands the application of the HRTFs into four faster operations that are 
performed for each of M important singular values. The first operation is a scalar 
multiplication (radiated pressure times U(i) for i = l :M) for each of the N sources. The 
signals are then summed and convolved only M times with the right singular vectors (V(i) 
for i = 1 :M). Each filtered signal is then multiplied by its singular value, a, and then 
summed to produce the left ear signal (a similar process is carried out for the right ear 
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signal). Notice that this method is more complicated than the exhaustive method and 
there is no advantage to the SVD method unless M is much smaller than N. 



Figure 35: Schematic of the SVD method 

So how many singular values really are important? Depending on how the HRTFs are 
pre-processed before SVD is performed, the value of the singular values will drop off 
slower or faster. Figure 36 shows a plot of the singular values under four different 
circumstances. In the first case, the SVD is executed on the FlRTFs over the entire 
frequency range. In the second case, the FlRTFs are logarithmically weighted according 
to frequency before performing the SVD. This applies more weight to lower frequencies 
since humans detect sound on a logarithmic scale. After performing the SVD, the 
logarithmic weighting is removed. The third scenario applies logaritlimic weighting and 
low-pass filters the sound at 2000 Flz. Lastly, the sound is logaritlimically weighted and 
low-pass filtered at 5000 Hz. Notice that the level of the singular values falls more 
sharply for smaller frequency ranges. This makes logical sense because it takes fewer 
singular values to reproduce a signal that contains less frequency information. The 
number of important singular values, then, is dependent upon the criterion used. The 
authors selected a 25 dB criterion, choosing to use three singular values. The real 
criterion in selecting the number of important singular values is the correlation between 
the SVD method and the exhaustive method, which will be discussed next. 
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Number of Singular Values, M 
Figure 36: Number of important singular values 

Figure 37b depicts the transfer function from the input force to the left ear (as shown in 
figure 37a) and compares this transfer function as it is calculated with the exhaustive and 
SVD methods (three singular values were used). Note that there is very good agreement 
between the two and that any discrepancies occur at low dB. A more concrete 
comparison can be made in the time domain, however. 




Figure 37a: Head orientation Figure 37b: Calculated pressure at left ear 
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Figure 38 shows the same transfer function from the input force to the left ear as shown 
above, but in the time domain. Since the data is only good up to 2000 Hz, a low-pass 
filter at 1600 Hz was applied. The correlation between the SVD (only 3 singular values 
used) and exhaustive methods was a remarkable 97.7%. From this result the authors 
concluded that singular value decomposition is an accurate reduction method and can be 
effectively used in this research. 
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Figure 38: Impulse response between input force and left ear 


ESR and SVD Together 

Since the ESR method works to reduce the number of input sources while the SVD 
method changes how HRTFs are applied, the two methods can be used simultaneously to 
yield an even greater reduction in computation time. Figure 39 shows a time domain 
comparison of the combined reduction methods and the exhaustive method. In this case, 
three singular values and an eight by eight grid of equivalent sources were used. Because 
the ESR method for an eight by eight grid is only accurate up to 980 Hz, a low-pass filter 
at 800 Hz was applied. A correlation of 97% was found between the combined reduction 
method and the exhaustive method. This level of accuracy is sufficient to conclude that 
the combined reduction method can be effectively used in this research. 
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Figure 39: Impulse response between input force and left ear 


Computation Time 

Now that it has been verified that the ESR, SVD, and combined reduction methods can 
accurately replace the exhaustive method, it remains to determine if these methods 
actually reduce the number of computations in this analysis. To handle this problem, the 
authors developed an estimate of the computation time required for each method that is 
based solely upon the number of additions and multiplications used in each method. 


The schematic of the exhaustive method is shown below in figure 40. Since the radiation 
model process is scalar multiplication, the number of computations is modeled to be N 
multiplications. Convolution is required in the process of filtering the radiated pressure 
with the HRTFs, so the number of computations required will be 12S*N multiplications 
plus 128*7V additions. Summing the N components of the left ear signal is simply N 
summations. In this model, addition and multiplication are estimated to require 
approximately the same amount of computation time. Adding up the number of 
summations and multiplications required by the exhaustive method yields a total of 
258*N computations. In the case of the original plate, with a 23 by 23 grid of 
measurement points, the total number of computations would be 136,482. 
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Figure 40: Computation time schematic of the exhaustive method 


In the case of the ESR method, the schematic is the same as in figure 40. The difference 
is found in the number of sources (N). For a model with 64 sources (which is accurate up 
to 980 Hz), the number of computations will be 16,512. If only 16 sources are used, the 
number of calculations would be reduced to 4128, but the data is only accurate up to 490 
Hz. There is an obvious reduction in computation time achieved using the ESR method, 
but there is a tradeoff involving the frequency range of accuracy. 


The reduction in computation time achieved with the SVD method is less intuitive 
because the system is more complex (figure 41). The computations required by the 
radiation model, though, are the same as with the ESR method: N multiplications. 
Multiplying the left singular vector, U(i), requires N multiplications for each of the M 
singular values, which is M*N multiplications. By the same reasoning, the summing 
process requires M*N summations. Since the convolution occurs after the summing 
process in the SVD method, only M*128 multiplications and M*128 additions are 
required. The scalar multiplication of the singular values is not included in the number of 
computations because it can be multiplied by the right singular vectors in pre-processing. 
The total number of computations is AI + 2MN+ 251 M, which is 4474 computations if 
three singular values and 529 input sources are used. 
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Figure 41 : Computation time schematic of the SVD method 

If ESR is combined with the SVD method, the number of input sources can be reduced to 
64 sources, for example. In that case, the total number of computations would be 1219. 
When compared with the 136,482 computations required by the exhaustive method, 

(even though this model is only an approximation) it is safe to conclude that the ESR, 
SVD, and combination methods all produce significant reduction in the required number 
of computations and in turn, processing time. Table 1, below, offers a comparison of the 
computation time associated with each method. 


Table 1 : Computations required by exhaustive and reduction methods 


Method 


N 



Exhaustive 

N/A 

529 

N/A 

136482 

ESR 

N/A 

64 

980 

16512 


N/A 

16 

490 

4128 


7 

529 

N/A 

9734 

1 ™ 

3 

529 

N/A 

4474 


7 

64 

980 

2759 

ESR & SVD 

7 

16 

490 

1911 

ESR & SVD 

3 

64 

980 

1219 

ESR & SVD 

3 

16 

490 

883 
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This reduction in computation time is important in order to create a real-time binaural 
simulation of structural acoustic data. In a real-time virtual environment, the user’s head 
orientation is measured by a head-tracking unit and fed into the computer. Typical 
update rates for head-tracking units are 100 Hz corresponding to one update every 10 ms. 
Until the computer receives another head orientation from the head-tracking unit, the left 
and right ear signals are calculated based upon the current head orientation. However, 
the audio and visual outputs are also based on other inputs. For instance, if the 
simulation were of a jet flying overhead, the sound input from the simulated jet would be 
updated at a sample rate of 44100 samples/second. This means that in order to create a 
structural acoustics simulation using the exhaustive method, the computer must make an 
estimated 136,482 calculations per ear in about 23 |is. With the combined reduction 
methods, only 883 calculations are required per ear in the same amount of time. This is 
an estimated computation time reduction of 99.4%, making real-time binaural analysis of 
structural acoustic data a possibility. 


Conclusion 

The goal of this research has been to reduce the computation time required in calculating 
the binaural signals to reproduce the sound radiated from a vibrating structure. This was 
accomplished by developing an exhaustive method to calculate the binaural signals and 
by verifying that method through a comparison of the results with actual data. Several 
reduction methods were investigated and the ESR and SVD methods showed exceptional 
promise. These two methods were tested by implementing the necessary changes to the 
computer code for the exhaustive method. The results were compared with those in the 
exhaustive method and found to have high correlations. In addition, the two reduction 
methods can also be combined since each method’s changes affect different portions of 
the calculation process. The only limiting factor with either method is that the ESR 
method has a frequency range of accuracy that depends upon the spacing of the 
equivalent sources. After the accuracy of the reduction methods was determined, the 
computation time required by each method was estimated. It was found that the ESR, 
SVD, and combined reduction methods all accurately replace the exhaustive method for 
calculating structural acoustic binaural signals while significantly reducing the estimated 
computation time. 


Future Work 

While there is excellent correlation between the results from the reduction methods and 
the exhaustive methods in objective tests, this analysis could be supported in the future 
with auditory tests. Some preliminary tests have been performed, but more scientific 
tests involving several subjects would enhance the results of this research. In addition, 
the effects of an acoustic enclosure and more complex geometries have yet to be 
investigated. 
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Another improvement to this research would be to use individualized HRTFs instead of 
generalized ones. The largest amount of error introduced into each method is probably 
the use of generalized HRTFs, because each person has differently shaped heads and 
torsos. Using individualized HRTFs would minimize that error, allowing listeners in 
auditory tests to distinguish small errors. Since the reduction methods cause smaller 
errors than that of the generalized HRTFs, this would greatly improve the validity of 
auditory tests. 
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ESR method reduces the number of sources to be analyzed in real-time computation by replacing sources on the 
scale of a stractural wavelength with sources on the scale of an acoustic wavelength. It is shown that the 
effectiveness of the SVD and liSR methods improves as the complexity of the source increases. In addition, 
preliminary auralization tests have shown that the results from both the SVD and ESR methods are 
indistinguishable from the results found with the exhaustive method. 
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